In [ ]:
#Importing necessary libraries
import pandas
import math
import pandas_datareader as web
import numpy as np
import pandas as pd
import datetime
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime
import prophet
from prophet import Prophet
from prophet.diagnostics import cross_validation
from prophet.diagnostics import performance_metrics
from prophet.plot import plot_plotly
import plotly.graph_objs as go
from plotly.subplots import make_subplots
import os
from numpy import math
import seaborn as sns
import matplotlib.pyplot as plt
from pandas.plotting import lag_plot
import plotly.graph_objects as go
from statsmodels.tsa.arima_model import ARIMA
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf,plot_pacf
import statsmodels.api as sm
import warnings
warnings.filterwarnings('ignore')
plt.style.use('fivethirtyeight')
#pip install pandas_datareader
In [ ]:
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook"

Stock prediction using LSTM -¶

Loading the Apple(NASDAQ: AAPL) Dataset -

In [ ]:
# Read the CSV file into a DataFrame
apple = pd.read_csv('AAPL.csv')

# Convert the 'Date' column to datetime
apple['Date'] = pd.to_datetime(apple['Date'], format='%d-%m-%Y')

# Set the 'Date' column as the index
apple.set_index('Date', inplace=True)

df = apple.copy()

# Filter the DataFrame for the date range 2015-01-01 to 2023-06-22
df = df['01-01-2015':'12-12-2022']

Explanation of the columns of the dataset -

  • Date: The date of the stock data entry, serving as the index of the dataframe.
  • Low: The lowest price of the stock for the given day.
  • Open: The price of the stock at the opening of the trading day.
  • Volume: The number of shares that were traded during the given day.
  • High: The highest price of the stock for the given day.
  • Close: The price of the stock at the closing of the trading day.
  • Adjusted Close: The closing price of the stock, adjusted for any corporate actions such as dividends, stock splits, and new stock offerings.
In [ ]:
df.head(2)
Out[ ]:
Low Open Volume High Close Adjusted Close
Date
2015-01-02 26.837500 27.8475 212818400 27.860001 27.3325 24.603201
2015-01-05 26.352501 27.0725 257142000 27.162500 26.5625 23.910093
In [ ]:
df_10 = pd.DataFrame()
df_10['Close'] = df['Close'].rolling(window=10).mean()
df_20 = pd.DataFrame()
df_20['Close'] = df['Close'].rolling(window=20).mean()
df_30 = pd.DataFrame()
df_30['Close'] = df['Close'].rolling(window=30).mean()
df_40 = pd.DataFrame()
df_40['Close'] = df['Close'].rolling(window=40).mean()
In [ ]:
#Visualize the data
plt.figure(figsize=(20,10))
plt.plot(df['Close'].tail(200), label='df')
plt.plot(df_10['Close'].tail(200), label='df_10')
plt.plot(df_20['Close'].tail(200), label='df_20')
plt.plot(df_30['Close'].tail(200), label='df_30')
plt.plot(df_40['Close'].tail(200), label='df_40')
plt.title('Apple Close Price History')
plt.xlabel('Year')
plt.ylabel('Close Price USD($)')
# Set major ticks format
# plt.gca().xaxis.set_major_locator(mdates.YearLocator(10))  # Set major ticks to appear every 5 years.
# plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y'))  # Format the ticks to display the year only.
plt.legend(loc='upper left')
plt.show()
No description has been provided for this image

Inference of the above plot -¶

The plot above is a visual representation of Apple's closing stock price along with its moving averages over different window sizes (10, 20, 30, and 40 days). Here are some inferences you can draw from the plot:

Trend Identification: The rolling averages smooth out the price data to identify the trend more clearly. The longer the window, the smoother the trend line. For example, the 40-day moving average (df_40) shows the smoothest trend, indicating a more general movement of the stock price over time compared to the actual daily closing prices (df).

Volatility: The original closing price data (df) shows more volatility with sharper rises and falls. In contrast, the rolling averages demonstrate less volatility.

Short-term vs. Long-term Trends: The shorter window moving averages (like df_10 and df_20) are closer to the actual closing prices and more responsive to daily price changes, reflecting short-term trends. The longer window averages (df_30 and df_40) lag more and are indicative of longer-term trends.

Potential Trading Signals: Crossovers of the different moving averages can be used as potential trading signals. For instance, when a shorter period moving average crosses above a longer period average, it can be seen as a bullish signal, and vice versa.

Support and Resistance Levels: Moving averages can act as support and resistance levels. The stock price seems to bounce off the moving average lines, especially the longer ones, indicating potential support or resistance areas.

Time Frame: The plot covers the latter part of the year 2022. The stock experienced some peaks and troughs, with the most notable peak around mid-2022 and a downward trend towards the end of 2022.

In [ ]:
# Create a new dataframe with only the 'Close column
data = df.filter(['Close'])
data.head()
Out[ ]:
Close
Date
2015-01-02 27.332500
2015-01-05 26.562500
2015-01-06 26.565001
2015-01-07 26.937500
2015-01-08 27.972500
In [ ]:
#Convert the dataframe to a numpy array
dataset = data.values
In [ ]:
#Get the number of rows to train the model on
training_data_len = math.ceil(len(dataset)*.8)
training_data_len
Out[ ]:
1601

Data Scaling -¶

Before feeding the data into the LSTM model, we normalize the stock prices using Min-Max Scaling. This technique scales the dataset so that all the input features lie between 0 and 1 inclusive, which helps in speeding up the convergence of stochastic gradient descent. Normalization is a common preprocessing step for neural network algorithms because it allows the model to train more efficiently.

In [ ]:
# Scale the data
scaler = MinMaxScaler()#feature_range=(0,1))
scaled_data = scaler.fit_transform(dataset)

scaled_data, scaled_data.shape
Out[ ]:
(array([[0.0297789 ],
        [0.02494904],
        [0.02496473],
        ...,
        [0.75311274],
        [0.75003925],
        [0.75104288]]),
 (2001, 1))
In [ ]:
# Create the training data set
train_data = scaled_data[0:training_data_len,:]
## Split the data into train and test
x_train = []
y_train = []

for i in range(60, len(train_data)):
  x_train.append(train_data[i-60:i,0])
  y_train.append(train_data[i,0])
  if i<=61:
    print(x_train)
    print(y_train)
    print()
print(len(x_train))
print(len(y_train))
[array([0.0297789 , 0.02494904, 0.02496473, 0.02730125, 0.03379333,
       0.03398151, 0.02965345, 0.03117454, 0.03051593, 0.02584288,
       0.02454132, 0.02882234, 0.03012389, 0.03459308, 0.0355026 ,
       0.03569077, 0.02948095, 0.03915635, 0.04478596, 0.04205741,
       0.04436256, 0.04439393, 0.04582092, 0.04641682, 0.044833  ,
       0.04607183, 0.04967853, 0.0541634 , 0.05664106, 0.05761331,
       0.05878941, 0.06018505, 0.05976165, 0.06140819, 0.06689667,
       0.06559511, 0.06029481, 0.06285087, 0.05977734, 0.06076525,
       0.06118866, 0.05990277, 0.05656266, 0.0568606 , 0.05770739,
       0.0535832 , 0.05002353, 0.0534891 , 0.05214051, 0.05427317,
       0.05755058, 0.05979302, 0.05827192, 0.05576291, 0.05781716,
       0.05700174, 0.0518112 , 0.0531598 , 0.05160735, 0.05649993])]
[0.053457746504823156]

[array([0.0297789 , 0.02494904, 0.02496473, 0.02730125, 0.03379333,
       0.03398151, 0.02965345, 0.03117454, 0.03051593, 0.02584288,
       0.02454132, 0.02882234, 0.03012389, 0.03459308, 0.0355026 ,
       0.03569077, 0.02948095, 0.03915635, 0.04478596, 0.04205741,
       0.04436256, 0.04439393, 0.04582092, 0.04641682, 0.044833  ,
       0.04607183, 0.04967853, 0.0541634 , 0.05664106, 0.05761331,
       0.05878941, 0.06018505, 0.05976165, 0.06140819, 0.06689667,
       0.06559511, 0.06029481, 0.06285087, 0.05977734, 0.06076525,
       0.06118866, 0.05990277, 0.05656266, 0.0568606 , 0.05770739,
       0.0535832 , 0.05002353, 0.0534891 , 0.05214051, 0.05427317,
       0.05755058, 0.05979302, 0.05827192, 0.05576291, 0.05781716,
       0.05700174, 0.0518112 , 0.0531598 , 0.05160735, 0.05649993]), array([0.02494904, 0.02496473, 0.02730125, 0.03379333, 0.03398151,
       0.02965345, 0.03117454, 0.03051593, 0.02584288, 0.02454132,
       0.02882234, 0.03012389, 0.03459308, 0.0355026 , 0.03569077,
       0.02948095, 0.03915635, 0.04478596, 0.04205741, 0.04436256,
       0.04439393, 0.04582092, 0.04641682, 0.044833  , 0.04607183,
       0.04967853, 0.0541634 , 0.05664106, 0.05761331, 0.05878941,
       0.06018505, 0.05976165, 0.06140819, 0.06689667, 0.06559511,
       0.06029481, 0.06285087, 0.05977734, 0.06076525, 0.06118866,
       0.05990277, 0.05656266, 0.0568606 , 0.05770739, 0.0535832 ,
       0.05002353, 0.0534891 , 0.05214051, 0.05427317, 0.05755058,
       0.05979302, 0.05827192, 0.05576291, 0.05781716, 0.05700174,
       0.0518112 , 0.0531598 , 0.05160735, 0.05649993, 0.05345775])]
[0.053457746504823156, 0.0531754816305183]

1541
1541
In [ ]:
#Conver the x_train and y_train to numpy array
x_train, y_train = np.array(x_train), np.array(y_train)
In [ ]:
#Reshape the data
x_train = np.reshape(x_train, (x_train.shape[0],x_train.shape[1],1))
x_train.shape
Out[ ]:
(1541, 60, 1)

LSTM Model Architecture -¶

Here we construct the architecture of our LSTM (Long Short-Term Memory) model. The LSTM is designed to recognize patterns in sequences of data, making it ideal for time series forecasting like stock price predictions.

In [ ]:
# Build the LSTM model
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(x_train.shape[1],1)))
model.add(LSTM(50, return_sequences=False))
model.add(Dense(25))
model.add(Dense(1))
In [ ]:
model.compile(optimizer='adam', loss='mean_squared_error')

Training the model -¶

In [ ]:
# Train the model
model.fit(x_train, y_train, batch_size=1, epochs=5)
Epoch 1/5
1541/1541 [==============================] - 23s 14ms/step - loss: 0.0011
Epoch 2/5
1541/1541 [==============================] - 22s 14ms/step - loss: 4.8845e-04
Epoch 3/5
1541/1541 [==============================] - 22s 14ms/step - loss: 4.1697e-04
Epoch 4/5
1541/1541 [==============================] - 22s 14ms/step - loss: 3.3086e-04
Epoch 5/5
1541/1541 [==============================] - 22s 14ms/step - loss: 2.9121e-04
Out[ ]:
<keras.callbacks.History at 0x23dda877430>
In [ ]:
## Create the testing data set
# Create a new array containing scaled vlues from index 1543 to 2003
test_data = scaled_data[training_data_len-60: , :]
#Create the data sets x_test and y_test
x_test = []
y_test = dataset[training_data_len:, : ]
for i in range(60, len(test_data)):
  x_test.append(test_data[i-60:i, 0])
In [ ]:
# Convert the data to a numpy array
x_test = np.array(x_test)
In [ ]:
# Reshape the data
x_test = np.reshape(x_test,(x_test.shape[0],x_test.shape[1],1))
In [ ]:
# Get the model predicted price values
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions)
In [ ]:
print(predictions[0:5], y_test[0:5])
[[122.95017]
 [123.74193]
 [126.0987 ]
 [126.08054]
 [124.70234]] [[124.97000122]
 [127.44999695]
 [126.26999664]
 [124.84999847]
 [124.69000244]]

Calculating MSE and RMSE -¶

To evaluate the performance of our LSTM model, we calculate two common metrics: Mean Squared Error (MSE) and Root Mean Squared Error (RMSE).

In [ ]:
# Get the MSE & RMSE
mse = np.mean(predictions-y_test)**2
rmse = np.sqrt(np.mean(predictions-y_test)**2)
mse, rmse
Out[ ]:
(10.638198038772586, 3.261625061035156)

Plotting the predictions -¶

In [ ]:
# Plot the data
train = data[:training_data_len]
valid = data[training_data_len:]
valid['Predictions'] = predictions
# Visualize the data
plt.figure(figsize=(16,8))
plt.title('Model')
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price USD ($)', fontsize=18)
plt.plot(train['Close'])
plt.plot(valid[['Close','Predictions']])
plt.legend(['True', 'Val', 'Predictions'], loc = 'lower right')
plt.show()
No description has been provided for this image

Inference for the above plot -¶

Historical and Predicted Data: The blue line represents the true historical closing prices of the stock up until a certain point, where the forecasting begins. The orange line appears to show the model's predictions for the closing price, overlapping with the validation set (in red) that seems to be actual prices not seen by the model during training.

Model Performance: The LSTM model has learned the underlying patterns in the historical data well enough to predict the future values that follow the trends of the actual stock prices reasonably closely. There's an evident correlation between the predicted values (orange) and the actual stock prices (red) in the validation set.

Trends and Volatility: The model captures both the overall upward trend and the intermediate fluctuations (volatility) of the stock prices over time. However, the exact peaks and troughs are not perfectly aligned, which is common in stock price predictions due to the complex and sometimes unpredictable nature of financial markets.

Forecast Horizon: The model appears to be making short-term forecasts as the predictions follow the actual prices closely. Long-term forecasts often diverge more from the actual prices due to accumulating prediction errors and the inherent uncertainty in the market.

Generalization: The LSTM seems to generalize well to unseen data, as indicated by the predictions following the validation set trend.

In conclusion, while the model appears to perform well, it is essential to consider the potential for overfitting and the fact that past performance is not always indicative of future results, especially in the stock market where many unforeseen factors can affect prices.

Stock Price Prediction -¶

This code block is designed to predict Apple Inc.'s closing stock price for the next day after the last date in the dataset provided. Here's a step-by-step breakdown of the process:

In [ ]:
# Get the quote
apple_quote = apple['01-01-2015':'12-05-2022']
#Create a new dataframe
new_df = apple_quote.filter(['Close'])
# Get the last 60 day closing price values and convert the dataframe to an array
last_60_days = new_df[-60:].values
#Scale the data to be values between 0 and 1
last_60_days_scaled = scaler.transform(last_60_days)
#Create an empty list
X_test = []
#Append the past 60 day 
X_test.append(last_60_days_scaled)
#Convert the X_test data set to a numpy array
X_test = np.array(X_test)
#Reshape the data
X_test = np.reshape(X_test, (X_test.shape[0],X_test.shape[1],1))
# Get the predicted scaled price
pred_price = model.predict(X_test)
#Undo the scaling
pred_price = scaler.inverse_transform(pred_price)
print(pred_price)
[[144.67435]]
In [ ]:
# Get the quote
apple_quote2 = apple['13-05-2022':'13-05-2022']
apple_quote2
Out[ ]:
Low Open Volume High Close Adjusted Close
Date
2022-05-13 143.110001 144.589996 113990900 148.100006 147.110001 146.662643

Stock price prediction using Prophet -¶

In [ ]:
df = apple.copy()

# Filter the DataFrame for the date range 2015-01-01 to 2023-06-22
df = df['01-01-2015':'12-12-2022']
In [ ]:
dir(prophet)
Out[ ]:
['Path',
 'Prophet',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 'about',
 'diagnostics',
 'f',
 'forecaster',
 'here',
 'make_holidays',
 'models',
 'plot']
In [ ]:
# To filter negative values
df.index[df['Close'] < 0]
Out[ ]:
DatetimeIndex([], dtype='datetime64[ns]', name='Date', freq=None)
In [ ]:
plt.figure(figsize=(16,8))
plt.plot(df['Close'])
plt.title('Trend')
plt.xlabel('Year', fontsize=18)
plt.ylabel('Close Price USD ($)', fontsize=18)
plt.show()
No description has been provided for this image

Data Preparation for Time Series Forecasting -¶

Differencing: We create a new column 'Close_shift' in the DataFrame df that contains the difference between each closing price and the previous day's closing price (y(t) - y(t-1)). This is achieved by subtracting the Close column with a version of itself that is shifted by one time step.

Stationarity: By differencing the data, we aim to make the time series stationary, meaning its statistical properties like mean and variance do not change over time, which is an assumption required by many time series forecasting models.

Resulting Series (y): The resulting differenced series y is typically used as the target variable for forecasting models, as it often leads to better model performance and more accurate forecasts.

In [ ]:
# yt = yt-y(t-1)
df['Close_shift'] = df['Close'] - df['Close'].shift(1)
y = df['Close_shift']
In [ ]:
df.head(2)
Out[ ]:
Low Open Volume High Close Adjusted Close Close_shift
Date
2015-01-02 26.837500 27.8475 212818400 27.860001 27.3325 24.603201 NaN
2015-01-05 26.352501 27.0725 257142000 27.162500 26.5625 23.910093 -0.77
In [ ]:
plt.figure(figsize=(16,8))
plt.plot(df['Close_shift'])
plt.title('Stationary data')
plt.xlabel('Year', fontsize=18)
plt.show()
No description has been provided for this image

Inference for the above plot -¶

Stationarity: The differenced data appears to hover around a mean of zero without any clear long-term upward or downward trend, suggesting that the differencing process has helped to stabilize the mean of the time series over the period shown.

Volatility: Despite the removal of the trend, there are still periods of noticeable volatility where the magnitude of the price changes increases, particularly towards the latter part of the time series. This could be reflective of real-world events impacting the stock prices.

Fluctuations: There are spikes throughout the series, indicating days with larger changes in price from one day to the next. These could be due to market events, earnings reports, product launches, or other news related to Apple Inc.

Random Walk: The plot exhibits characteristics of a random walk, with price changes appearing to be random and not exhibiting any seasonal or cyclical patterns. This is typical of differenced stock market data.

Outliers: There are several sharp spikes that stand out from the general 'noise' of the plot, suggesting particularly volatile days.

Modeling Consideration: The lack of trend or seasonality in this stationary data suggests that simpler models might be used to predict future price changes. However, the apparent noise and outliers indicate that any model would need to account for volatility and possible non-normal distribution of changes.

Now, the above graph looks stationary

Initializing the model -¶

In [ ]:
model = Prophet()

Paremeters

  • growth: linear/logistic
  • seasonality:additive/multiplicative
  • changepoint
In [ ]:
df = df.reset_index()
df = df.filter(['Date','Close','Close_shift'])
In [ ]:
df.columns
Out[ ]:
Index(['Date', 'Close', 'Close_shift'], dtype='object')
In [ ]:
## Rename the columns as ds and y
df_pht = df.rename(columns={'Date':'ds', 'Close_shift':'y'})#, inplace=True)
In [ ]:
df_pht.head()
Out[ ]:
ds Close y
0 2015-01-02 27.332500 NaN
1 2015-01-05 26.562500 -0.770000
2 2015-01-06 26.565001 0.002501
3 2015-01-07 26.937500 0.372499
4 2015-01-08 27.972500 1.035000

Splitting the data -¶

In [ ]:
## Split the Data
train_data_len = int(0.8*len(df))
train_data_len
Out[ ]:
1600
In [ ]:
df_train = df_pht[:train_data_len]
df_test = df_pht[train_data_len:]
df_train.head(), df_train.shape
Out[ ]:
(          ds      Close         y
 0 2015-01-02  27.332500       NaN
 1 2015-01-05  26.562500 -0.770000
 2 2015-01-06  26.565001  0.002501
 3 2015-01-07  26.937500  0.372499
 4 2015-01-08  27.972500  1.035000,
 (1600, 3))

Fitting our model to our data -¶

In [ ]:
model.fit(df_train)
01:21:24 - cmdstanpy - INFO - Chain [1] start processing
01:21:24 - cmdstanpy - INFO - Chain [1] done processing
Out[ ]:
<prophet.forecaster.Prophet at 0x23dda88fac0>
In [ ]:
periods = len(df)-len(df_train)
periods
Out[ ]:
401
In [ ]:
# Create Future Datas of len(df)-len(df_train) days
future_dates = model.make_future_dataframe(periods=periods)
In [ ]:
# Shape after adding 401 days
future_dates.shape
Out[ ]:
(2000, 1)
In [ ]:
future_dates.head()
Out[ ]:
ds
0 2015-01-05
1 2015-01-06
2 2015-01-07
3 2015-01-08
4 2015-01-09
In [ ]:
# Make Prediction 
prediction = model.predict(future_dates)
In [ ]:
prediction.head()
Out[ ]:
ds trend yhat_lower yhat_upper trend_lower trend_upper additive_terms additive_terms_lower additive_terms_upper weekly weekly_lower weekly_upper yearly yearly_lower yearly_upper multiplicative_terms multiplicative_terms_lower multiplicative_terms_upper yhat
0 2015-01-05 -0.148975 -1.568756 1.798069 -0.148975 -0.148975 0.215167 0.215167 0.215167 0.263429 0.263429 0.263429 -0.048261 -0.048261 -0.048261 0.0 0.0 0.0 0.066193
1 2015-01-06 -0.148900 -1.690080 1.556151 -0.148900 -0.148900 0.091384 0.091384 0.091384 0.132852 0.132852 0.132852 -0.041467 -0.041467 -0.041467 0.0 0.0 0.0 -0.057515
2 2015-01-07 -0.148825 -1.597548 1.674854 -0.148825 -0.148825 0.132297 0.132297 0.132297 0.164991 0.164991 0.164991 -0.032694 -0.032694 -0.032694 0.0 0.0 0.0 -0.016528
3 2015-01-08 -0.148750 -1.830930 1.595346 -0.148750 -0.148750 0.018352 0.018352 0.018352 0.040505 0.040505 0.040505 -0.022153 -0.022153 -0.022153 0.0 0.0 0.0 -0.130398
4 2015-01-09 -0.148675 -1.880065 1.589826 -0.148675 -0.148675 0.009091 0.009091 0.009091 0.019189 0.019189 0.019189 -0.010098 -0.010098 -0.010098 0.0 0.0 0.0 -0.139584

Narrative

  • yhat : the predicted forecast

  • yhat_lower : the lower border of the prediction

  • yhat_upper: the upper border of the prediction

In [ ]:
print(prediction['ds'].dtype)
datetime64[ns]
In [ ]:
plot_plotly(model, prediction)

Interpretation of the above plot -¶

Central Trend: The model's prediction line appears to hover close to zero throughout the years, suggesting that there is no strong long-term trend in the day-to-day price changes.

Periods of Stability: In the earlier years, particularly from 2015 to around 2019, the actual data points are closely packed around the zero line, indicating less volatility and relatively stable day-to-day price changes.

Increasing Volatility: Starting from around 2020 onwards, there is a noticeable increase in the spread of data points on the y-axis, indicating higher volatility in the stock price changes.

Extreme Changes: The occurrence of data points that reach the extremes of the y-axis, especially those above +5 or below -5, suggests days with significant price movements, which could be due to market shocks, earnings reports, product announcements, or global economic events.

Consistency with Historical Events: The increased volatility in the later years could correlate with real-world events affecting the financial markets, such as economic uncertainty, changes in technology, or shifts in consumer behavior that could impact Apple's stock specifically.

Narrative

  • A Trending data

  • Black dots : the actual data points in our dataset.

  • Deep blue line : the predicted forecast/the predicted values

  • Light blue line : the boundaries

In [ ]:
from prophet.plot import plot_components_plotly
fig = plot_components_plotly(model, prediction)
fig.show()

Inference for the above plot -¶

Trend: The first plot shows a generally downward trend over the years from 2015 to 2022, suggesting a decrease in the daily stock price changes over time.

Yearly Seasonality: The second plot indicates a clear yearly pattern with certain times of the year, like mid-year and year-end, experiencing more pronounced fluctuations.

Weekly Seasonality: The third plot reveals weekly seasonality with variations within the week, showing that certain days of the week consistently have different price change patterns, with the weekend showing a steeper decline.

Cross Validation

  • For measuring forecast error by comparing the predicted values with the actual values

  • initial:the size of the initial training period

  • period : the spacing between cutoff dates

  • horizon : the forecast horizon((ds minus cutoff)

  • By default, the initial training period is set to three times the horizon, and cutoffs are made every half a horizon

In [ ]:
df_train.shape, len(df)-len(df_train)
Out[ ]:
((1600, 3), 401)
In [ ]:
cv = cross_validation(model, initial='1600 days', period='3 days', horizon = '401 days')
print("Cross Validation Done")
  0%|          | 0/106 [00:00<?, ?it/s]01:21:27 - cmdstanpy - INFO - Chain [1] start processing
01:21:27 - cmdstanpy - INFO - Chain [1] done processing
  1%|          | 1/106 [00:00<00:33,  3.10it/s]01:21:28 - cmdstanpy - INFO - Chain [1] start processing
01:21:28 - cmdstanpy - INFO - Chain [1] done processing
  2%|▏         | 2/106 [00:00<00:32,  3.16it/s]01:21:28 - cmdstanpy - INFO - Chain [1] start processing
01:21:28 - cmdstanpy - INFO - Chain [1] done processing
  3%|▎         | 3/106 [00:00<00:31,  3.24it/s]01:21:28 - cmdstanpy - INFO - Chain [1] start processing
01:21:28 - cmdstanpy - INFO - Chain [1] done processing
  4%|▍         | 4/106 [00:01<00:32,  3.14it/s]01:21:28 - cmdstanpy - INFO - Chain [1] start processing
01:21:29 - cmdstanpy - INFO - Chain [1] done processing
  5%|▍         | 5/106 [00:01<00:32,  3.12it/s]01:21:29 - cmdstanpy - INFO - Chain [1] start processing
01:21:29 - cmdstanpy - INFO - Chain [1] done processing
  6%|▌         | 6/106 [00:01<00:31,  3.14it/s]01:21:29 - cmdstanpy - INFO - Chain [1] start processing
01:21:29 - cmdstanpy - INFO - Chain [1] done processing
  7%|▋         | 7/106 [00:02<00:31,  3.15it/s]01:21:29 - cmdstanpy - INFO - Chain [1] start processing
01:21:30 - cmdstanpy - INFO - Chain [1] done processing
  8%|▊         | 8/106 [00:02<00:31,  3.15it/s]01:21:30 - cmdstanpy - INFO - Chain [1] start processing
01:21:30 - cmdstanpy - INFO - Chain [1] done processing
  8%|▊         | 9/106 [00:02<00:30,  3.13it/s]01:21:30 - cmdstanpy - INFO - Chain [1] start processing
01:21:30 - cmdstanpy - INFO - Chain [1] done processing
  9%|▉         | 10/106 [00:03<00:30,  3.18it/s]01:21:30 - cmdstanpy - INFO - Chain [1] start processing
01:21:30 - cmdstanpy - INFO - Chain [1] done processing
 10%|█         | 11/106 [00:03<00:30,  3.15it/s]01:21:31 - cmdstanpy - INFO - Chain [1] start processing
01:21:31 - cmdstanpy - INFO - Chain [1] done processing
 11%|█▏        | 12/106 [00:03<00:29,  3.16it/s]01:21:31 - cmdstanpy - INFO - Chain [1] start processing
01:21:31 - cmdstanpy - INFO - Chain [1] done processing
 12%|█▏        | 13/106 [00:04<00:29,  3.16it/s]01:21:31 - cmdstanpy - INFO - Chain [1] start processing
01:21:31 - cmdstanpy - INFO - Chain [1] done processing
 13%|█▎        | 14/106 [00:04<00:29,  3.13it/s]01:21:32 - cmdstanpy - INFO - Chain [1] start processing
01:21:32 - cmdstanpy - INFO - Chain [1] done processing
 14%|█▍        | 15/106 [00:04<00:29,  3.13it/s]01:21:32 - cmdstanpy - INFO - Chain [1] start processing
01:21:32 - cmdstanpy - INFO - Chain [1] done processing
 15%|█▌        | 16/106 [00:05<00:29,  3.07it/s]01:21:32 - cmdstanpy - INFO - Chain [1] start processing
01:21:32 - cmdstanpy - INFO - Chain [1] done processing
 16%|█▌        | 17/106 [00:05<00:28,  3.10it/s]01:21:33 - cmdstanpy - INFO - Chain [1] start processing
01:21:33 - cmdstanpy - INFO - Chain [1] done processing
 17%|█▋        | 18/106 [00:05<00:28,  3.10it/s]01:21:33 - cmdstanpy - INFO - Chain [1] start processing
01:21:33 - cmdstanpy - INFO - Chain [1] done processing
 18%|█▊        | 19/106 [00:06<00:28,  3.09it/s]01:21:33 - cmdstanpy - INFO - Chain [1] start processing
01:21:33 - cmdstanpy - INFO - Chain [1] done processing
 19%|█▉        | 20/106 [00:06<00:28,  3.04it/s]01:21:34 - cmdstanpy - INFO - Chain [1] start processing
01:21:34 - cmdstanpy - INFO - Chain [1] done processing
 20%|█▉        | 21/106 [00:06<00:28,  3.01it/s]01:21:34 - cmdstanpy - INFO - Chain [1] start processing
01:21:34 - cmdstanpy - INFO - Chain [1] done processing
 21%|██        | 22/106 [00:07<00:31,  2.65it/s]01:21:34 - cmdstanpy - INFO - Chain [1] start processing
01:21:35 - cmdstanpy - INFO - Chain [1] done processing
 22%|██▏       | 23/106 [00:07<00:30,  2.72it/s]01:21:35 - cmdstanpy - INFO - Chain [1] start processing
01:21:35 - cmdstanpy - INFO - Chain [1] done processing
 23%|██▎       | 24/106 [00:07<00:29,  2.76it/s]01:21:35 - cmdstanpy - INFO - Chain [1] start processing
01:21:35 - cmdstanpy - INFO - Chain [1] done processing
 24%|██▎       | 25/106 [00:08<00:28,  2.81it/s]01:21:36 - cmdstanpy - INFO - Chain [1] start processing
01:21:36 - cmdstanpy - INFO - Chain [1] done processing
 25%|██▍       | 26/106 [00:08<00:27,  2.86it/s]01:21:36 - cmdstanpy - INFO - Chain [1] start processing
01:21:36 - cmdstanpy - INFO - Chain [1] done processing
 25%|██▌       | 27/106 [00:09<00:37,  2.11it/s]01:21:37 - cmdstanpy - INFO - Chain [1] start processing
01:21:37 - cmdstanpy - INFO - Chain [1] done processing
 26%|██▋       | 28/106 [00:09<00:34,  2.28it/s]01:21:37 - cmdstanpy - INFO - Chain [1] start processing
01:21:37 - cmdstanpy - INFO - Chain [1] done processing
 27%|██▋       | 29/106 [00:10<00:31,  2.46it/s]01:21:37 - cmdstanpy - INFO - Chain [1] start processing
01:21:37 - cmdstanpy - INFO - Chain [1] done processing
 28%|██▊       | 30/106 [00:10<00:29,  2.60it/s]01:21:38 - cmdstanpy - INFO - Chain [1] start processing
01:21:38 - cmdstanpy - INFO - Chain [1] done processing
 29%|██▉       | 31/106 [00:10<00:27,  2.71it/s]01:21:38 - cmdstanpy - INFO - Chain [1] start processing
01:21:38 - cmdstanpy - INFO - Chain [1] done processing
 30%|███       | 32/106 [00:11<00:26,  2.81it/s]01:21:38 - cmdstanpy - INFO - Chain [1] start processing
01:21:38 - cmdstanpy - INFO - Chain [1] done processing
 31%|███       | 33/106 [00:11<00:25,  2.85it/s]01:21:39 - cmdstanpy - INFO - Chain [1] start processing
01:21:39 - cmdstanpy - INFO - Chain [1] done processing
 32%|███▏      | 34/106 [00:11<00:24,  2.93it/s]01:21:39 - cmdstanpy - INFO - Chain [1] start processing
01:21:39 - cmdstanpy - INFO - Chain [1] done processing
 33%|███▎      | 35/106 [00:12<00:23,  3.01it/s]01:21:39 - cmdstanpy - INFO - Chain [1] start processing
01:21:39 - cmdstanpy - INFO - Chain [1] done processing
 34%|███▍      | 36/106 [00:12<00:22,  3.08it/s]01:21:40 - cmdstanpy - INFO - Chain [1] start processing
01:21:40 - cmdstanpy - INFO - Chain [1] done processing
 35%|███▍      | 37/106 [00:12<00:22,  3.03it/s]01:21:40 - cmdstanpy - INFO - Chain [1] start processing
01:21:40 - cmdstanpy - INFO - Chain [1] done processing
 36%|███▌      | 38/106 [00:12<00:22,  3.05it/s]01:21:40 - cmdstanpy - INFO - Chain [1] start processing
01:21:40 - cmdstanpy - INFO - Chain [1] done processing
 37%|███▋      | 39/106 [00:13<00:22,  3.02it/s]01:21:41 - cmdstanpy - INFO - Chain [1] start processing
01:21:41 - cmdstanpy - INFO - Chain [1] done processing
 38%|███▊      | 40/106 [00:13<00:21,  3.04it/s]01:21:41 - cmdstanpy - INFO - Chain [1] start processing
01:21:41 - cmdstanpy - INFO - Chain [1] done processing
 39%|███▊      | 41/106 [00:13<00:21,  3.04it/s]01:21:41 - cmdstanpy - INFO - Chain [1] start processing
01:21:41 - cmdstanpy - INFO - Chain [1] done processing
 40%|███▉      | 42/106 [00:14<00:21,  3.03it/s]01:21:42 - cmdstanpy - INFO - Chain [1] start processing
01:21:42 - cmdstanpy - INFO - Chain [1] done processing
 41%|████      | 43/106 [00:14<00:20,  3.00it/s]01:21:42 - cmdstanpy - INFO - Chain [1] start processing
01:21:42 - cmdstanpy - INFO - Chain [1] done processing
 42%|████▏     | 44/106 [00:15<00:21,  2.89it/s]01:21:42 - cmdstanpy - INFO - Chain [1] start processing
01:21:42 - cmdstanpy - INFO - Chain [1] done processing
 42%|████▏     | 45/106 [00:15<00:20,  2.91it/s]01:21:43 - cmdstanpy - INFO - Chain [1] start processing
01:21:43 - cmdstanpy - INFO - Chain [1] done processing
 43%|████▎     | 46/106 [00:15<00:20,  2.91it/s]01:21:43 - cmdstanpy - INFO - Chain [1] start processing
01:21:43 - cmdstanpy - INFO - Chain [1] done processing
 44%|████▍     | 47/106 [00:16<00:20,  2.89it/s]01:21:43 - cmdstanpy - INFO - Chain [1] start processing
01:21:43 - cmdstanpy - INFO - Chain [1] done processing
 45%|████▌     | 48/106 [00:16<00:20,  2.89it/s]01:21:44 - cmdstanpy - INFO - Chain [1] start processing
01:21:44 - cmdstanpy - INFO - Chain [1] done processing
 46%|████▌     | 49/106 [00:16<00:19,  2.94it/s]01:21:44 - cmdstanpy - INFO - Chain [1] start processing
01:21:44 - cmdstanpy - INFO - Chain [1] done processing
 47%|████▋     | 50/106 [00:17<00:19,  2.85it/s]01:21:44 - cmdstanpy - INFO - Chain [1] start processing
01:21:44 - cmdstanpy - INFO - Chain [1] done processing
 48%|████▊     | 51/106 [00:17<00:19,  2.87it/s]01:21:45 - cmdstanpy - INFO - Chain [1] start processing
01:21:45 - cmdstanpy - INFO - Chain [1] done processing
 49%|████▉     | 52/106 [00:17<00:18,  2.88it/s]01:21:45 - cmdstanpy - INFO - Chain [1] start processing
01:21:45 - cmdstanpy - INFO - Chain [1] done processing
 50%|█████     | 53/106 [00:18<00:17,  2.96it/s]01:21:45 - cmdstanpy - INFO - Chain [1] start processing
01:21:45 - cmdstanpy - INFO - Chain [1] done processing
 51%|█████     | 54/106 [00:18<00:17,  2.98it/s]01:21:46 - cmdstanpy - INFO - Chain [1] start processing
01:21:46 - cmdstanpy - INFO - Chain [1] done processing
 52%|█████▏    | 55/106 [00:18<00:17,  2.97it/s]01:21:46 - cmdstanpy - INFO - Chain [1] start processing
01:21:46 - cmdstanpy - INFO - Chain [1] done processing
 53%|█████▎    | 56/106 [00:19<00:16,  2.99it/s]01:21:46 - cmdstanpy - INFO - Chain [1] start processing
01:21:46 - cmdstanpy - INFO - Chain [1] done processing
 54%|█████▍    | 57/106 [00:19<00:16,  3.01it/s]01:21:47 - cmdstanpy - INFO - Chain [1] start processing
01:21:47 - cmdstanpy - INFO - Chain [1] done processing
 55%|█████▍    | 58/106 [00:19<00:16,  2.97it/s]01:21:47 - cmdstanpy - INFO - Chain [1] start processing
01:21:47 - cmdstanpy - INFO - Chain [1] done processing
 56%|█████▌    | 59/106 [00:20<00:15,  2.97it/s]01:21:47 - cmdstanpy - INFO - Chain [1] start processing
01:21:47 - cmdstanpy - INFO - Chain [1] done processing
 57%|█████▋    | 60/106 [00:20<00:15,  2.92it/s]01:21:48 - cmdstanpy - INFO - Chain [1] start processing
01:21:48 - cmdstanpy - INFO - Chain [1] done processing
 58%|█████▊    | 61/106 [00:20<00:15,  2.96it/s]01:21:48 - cmdstanpy - INFO - Chain [1] start processing
01:21:48 - cmdstanpy - INFO - Chain [1] done processing
 58%|█████▊    | 62/106 [00:21<00:14,  2.98it/s]01:21:48 - cmdstanpy - INFO - Chain [1] start processing
01:21:48 - cmdstanpy - INFO - Chain [1] done processing
 59%|█████▉    | 63/106 [00:21<00:14,  3.03it/s]01:21:49 - cmdstanpy - INFO - Chain [1] start processing
01:21:49 - cmdstanpy - INFO - Chain [1] done processing
 60%|██████    | 64/106 [00:21<00:14,  2.95it/s]01:21:49 - cmdstanpy - INFO - Chain [1] start processing
01:21:49 - cmdstanpy - INFO - Chain [1] done processing
 61%|██████▏   | 65/106 [00:22<00:13,  2.99it/s]01:21:49 - cmdstanpy - INFO - Chain [1] start processing
01:21:49 - cmdstanpy - INFO - Chain [1] done processing
 62%|██████▏   | 66/106 [00:22<00:13,  2.95it/s]01:21:50 - cmdstanpy - INFO - Chain [1] start processing
01:21:50 - cmdstanpy - INFO - Chain [1] done processing
 63%|██████▎   | 67/106 [00:22<00:13,  2.97it/s]01:21:50 - cmdstanpy - INFO - Chain [1] start processing
01:21:50 - cmdstanpy - INFO - Chain [1] done processing
 64%|██████▍   | 68/106 [00:23<00:12,  3.01it/s]01:21:50 - cmdstanpy - INFO - Chain [1] start processing
01:21:50 - cmdstanpy - INFO - Chain [1] done processing
 65%|██████▌   | 69/106 [00:23<00:12,  3.01it/s]01:21:51 - cmdstanpy - INFO - Chain [1] start processing
01:21:51 - cmdstanpy - INFO - Chain [1] done processing
 66%|██████▌   | 70/106 [00:23<00:11,  3.06it/s]01:21:51 - cmdstanpy - INFO - Chain [1] start processing
01:21:51 - cmdstanpy - INFO - Chain [1] done processing
 67%|██████▋   | 71/106 [00:24<00:11,  3.11it/s]01:21:51 - cmdstanpy - INFO - Chain [1] start processing
01:21:51 - cmdstanpy - INFO - Chain [1] done processing
 68%|██████▊   | 72/106 [00:24<00:10,  3.11it/s]01:21:52 - cmdstanpy - INFO - Chain [1] start processing
01:21:52 - cmdstanpy - INFO - Chain [1] done processing
 69%|██████▉   | 73/106 [00:24<00:10,  3.08it/s]01:21:52 - cmdstanpy - INFO - Chain [1] start processing
01:21:52 - cmdstanpy - INFO - Chain [1] done processing
 70%|██████▉   | 74/106 [00:25<00:10,  3.04it/s]01:21:52 - cmdstanpy - INFO - Chain [1] start processing
01:21:52 - cmdstanpy - INFO - Chain [1] done processing
 71%|███████   | 75/106 [00:25<00:10,  3.09it/s]01:21:53 - cmdstanpy - INFO - Chain [1] start processing
01:21:53 - cmdstanpy - INFO - Chain [1] done processing
 72%|███████▏  | 76/106 [00:25<00:09,  3.06it/s]01:21:53 - cmdstanpy - INFO - Chain [1] start processing
01:21:53 - cmdstanpy - INFO - Chain [1] done processing
 73%|███████▎  | 77/106 [00:26<00:09,  3.02it/s]01:21:53 - cmdstanpy - INFO - Chain [1] start processing
01:21:53 - cmdstanpy - INFO - Chain [1] done processing
 74%|███████▎  | 78/106 [00:26<00:09,  2.99it/s]01:21:54 - cmdstanpy - INFO - Chain [1] start processing
01:21:54 - cmdstanpy - INFO - Chain [1] done processing
 75%|███████▍  | 79/106 [00:26<00:09,  2.83it/s]01:21:54 - cmdstanpy - INFO - Chain [1] start processing
01:21:54 - cmdstanpy - INFO - Chain [1] done processing
 75%|███████▌  | 80/106 [00:27<00:09,  2.85it/s]01:21:54 - cmdstanpy - INFO - Chain [1] start processing
01:21:54 - cmdstanpy - INFO - Chain [1] done processing
 76%|███████▋  | 81/106 [00:27<00:08,  2.86it/s]01:21:55 - cmdstanpy - INFO - Chain [1] start processing
01:21:55 - cmdstanpy - INFO - Chain [1] done processing
 77%|███████▋  | 82/106 [00:27<00:08,  2.91it/s]01:21:55 - cmdstanpy - INFO - Chain [1] start processing
01:21:55 - cmdstanpy - INFO - Chain [1] done processing
 78%|███████▊  | 83/106 [00:28<00:07,  2.88it/s]01:21:55 - cmdstanpy - INFO - Chain [1] start processing
01:21:55 - cmdstanpy - INFO - Chain [1] done processing
 79%|███████▉  | 84/106 [00:28<00:07,  2.96it/s]01:21:56 - cmdstanpy - INFO - Chain [1] start processing
01:21:56 - cmdstanpy - INFO - Chain [1] done processing
 80%|████████  | 85/106 [00:28<00:07,  2.93it/s]01:21:56 - cmdstanpy - INFO - Chain [1] start processing
01:21:56 - cmdstanpy - INFO - Chain [1] done processing
 81%|████████  | 86/106 [00:29<00:06,  2.95it/s]01:21:56 - cmdstanpy - INFO - Chain [1] start processing
01:21:56 - cmdstanpy - INFO - Chain [1] done processing
 82%|████████▏ | 87/106 [00:29<00:06,  3.01it/s]01:21:57 - cmdstanpy - INFO - Chain [1] start processing
01:21:57 - cmdstanpy - INFO - Chain [1] done processing
 83%|████████▎ | 88/106 [00:29<00:05,  3.04it/s]01:21:57 - cmdstanpy - INFO - Chain [1] start processing
01:21:57 - cmdstanpy - INFO - Chain [1] done processing
 84%|████████▍ | 89/106 [00:30<00:05,  3.01it/s]01:21:57 - cmdstanpy - INFO - Chain [1] start processing
01:21:57 - cmdstanpy - INFO - Chain [1] done processing
 85%|████████▍ | 90/106 [00:30<00:05,  3.00it/s]01:21:58 - cmdstanpy - INFO - Chain [1] start processing
01:21:58 - cmdstanpy - INFO - Chain [1] done processing
 86%|████████▌ | 91/106 [00:30<00:04,  3.02it/s]01:21:58 - cmdstanpy - INFO - Chain [1] start processing
01:21:58 - cmdstanpy - INFO - Chain [1] done processing
 87%|████████▋ | 92/106 [00:31<00:04,  3.01it/s]01:21:58 - cmdstanpy - INFO - Chain [1] start processing
01:21:58 - cmdstanpy - INFO - Chain [1] done processing
 88%|████████▊ | 93/106 [00:31<00:04,  2.99it/s]01:21:59 - cmdstanpy - INFO - Chain [1] start processing
01:21:59 - cmdstanpy - INFO - Chain [1] done processing
 89%|████████▊ | 94/106 [00:31<00:04,  2.96it/s]01:21:59 - cmdstanpy - INFO - Chain [1] start processing
01:21:59 - cmdstanpy - INFO - Chain [1] done processing
 90%|████████▉ | 95/106 [00:32<00:03,  2.93it/s]01:21:59 - cmdstanpy - INFO - Chain [1] start processing
01:22:00 - cmdstanpy - INFO - Chain [1] done processing
 91%|█████████ | 96/106 [00:32<00:03,  2.93it/s]01:22:00 - cmdstanpy - INFO - Chain [1] start processing
01:22:00 - cmdstanpy - INFO - Chain [1] done processing
 92%|█████████▏| 97/106 [00:32<00:03,  2.85it/s]01:22:00 - cmdstanpy - INFO - Chain [1] start processing
01:22:00 - cmdstanpy - INFO - Chain [1] done processing
 92%|█████████▏| 98/106 [00:33<00:02,  2.81it/s]01:22:01 - cmdstanpy - INFO - Chain [1] start processing
01:22:01 - cmdstanpy - INFO - Chain [1] done processing
 93%|█████████▎| 99/106 [00:33<00:02,  2.77it/s]01:22:01 - cmdstanpy - INFO - Chain [1] start processing
01:22:01 - cmdstanpy - INFO - Chain [1] done processing
 94%|█████████▍| 100/106 [00:33<00:02,  2.84it/s]01:22:01 - cmdstanpy - INFO - Chain [1] start processing
01:22:01 - cmdstanpy - INFO - Chain [1] done processing
 95%|█████████▌| 101/106 [00:34<00:01,  2.82it/s]01:22:02 - cmdstanpy - INFO - Chain [1] start processing
01:22:02 - cmdstanpy - INFO - Chain [1] done processing
 96%|█████████▌| 102/106 [00:34<00:01,  2.77it/s]01:22:02 - cmdstanpy - INFO - Chain [1] start processing
01:22:02 - cmdstanpy - INFO - Chain [1] done processing
 97%|█████████▋| 103/106 [00:35<00:01,  2.80it/s]01:22:02 - cmdstanpy - INFO - Chain [1] start processing
01:22:02 - cmdstanpy - INFO - Chain [1] done processing
 98%|█████████▊| 104/106 [00:35<00:00,  2.82it/s]01:22:03 - cmdstanpy - INFO - Chain [1] start processing
01:22:03 - cmdstanpy - INFO - Chain [1] done processing
 99%|█████████▉| 105/106 [00:35<00:00,  2.78it/s]01:22:03 - cmdstanpy - INFO - Chain [1] start processing
01:22:03 - cmdstanpy - INFO - Chain [1] done processing
100%|██████████| 106/106 [00:36<00:00,  2.93it/s]
Cross Validation Done

Performance Metrics

In [ ]:
df_train_pm = performance_metrics(cv)
In [ ]:
df_train_pm
Out[ ]:
horizon mse rmse mae mdape smape coverage
0 41 days 2.999285 1.731844 1.091885 1.008353 1.711028 0.558665
1 42 days 3.010573 1.735100 1.094608 1.009224 1.714091 0.557302
2 43 days 3.051772 1.746932 1.099157 1.008614 1.712342 0.557470
3 44 days 3.015368 1.736482 1.099211 1.009529 1.714617 0.553777
4 45 days 3.025132 1.739291 1.101770 1.010231 1.717723 0.552851
... ... ... ... ... ... ... ...
356 397 days 7.261271 2.694675 2.011761 0.998250 1.795379 0.291606
357 398 days 7.235686 2.689923 2.013692 0.998658 1.795402 0.289480
358 399 days 7.269176 2.696141 2.013672 0.998479 1.794511 0.291373
359 400 days 7.279526 2.698060 2.016175 0.998341 1.793715 0.289505
360 401 days 7.253374 2.693209 2.018154 0.998798 1.793873 0.286914

361 rows × 7 columns

MSE & RMSE¶

In [ ]:
MSE = sum(df_train_pm['mse'])/len(df_train_pm)
RMSE = sum(df_train_pm['rmse'])/len(df_train_pm)
MSE, RMSE
Out[ ]:
(5.607989691147735, 2.338448145623682)

Visualizing Performance Metrics

In [ ]:
# Assuming 'model' is your fitted Prophet model and 'prediction' is the forecast DataFrame
fig = plot_plotly(model, prediction)

# Extract changepoints
changepoints = model.changepoints

# Add changepoint lines to the plot
for changepoint in changepoints:
    fig.add_trace(go.Scatter(
        x=[changepoint, changepoint],
        y=[min(prediction['yhat_lower']), max(prediction['yhat_upper'])],
        mode='lines',
        line=dict(color='red', dash='dash'),
        name='Changepoint'
    ))

fig.show()

Inference for the above plot -¶

Changepoints: The red dashed lines represent changepoints where the Prophet model has detected possible shifts in the stock's trajectory. These could correspond to moments when the underlying trend in the daily price changes significantly altered.

Stock Movement: The dots represent actual daily price changes, with the dense clustering around zero and occasional spikes indicating days with larger changes.

Corporate Events: Sudden movements in stock prices, especially those coinciding with the red changepoint lines, could be linked to corporate events, significant news releases, or market shifts affecting Apple. For example, product launches, earnings reports, or broader market turbulence might align with these changepoints.

Interpretation and Research: To confirm whether these changepoints align with actual events, one would need to cross-reference these dates with historical news and corporate event data for Apple.

Predictive Insights: This visual analysis could provide predictive insights, allowing stakeholders to understand potential impacts of similar future events on stock volatility.

Legal and Regulatory News: Legal battles, such as Apple's antitrust cases or patent disputes, can affect investor sentiment and thus stock prices, as seen in past fluctuations during such events.

Economic Shifts: Broader economic shifts, like the 2020 market crash due to the COVID-19 pandemic, also had a notable impact on Apple’s stock, as it did with many others, reflecting global economic sentiment.

Stock price prediction using ARIMA -¶

In [ ]:
df = apple.copy()

# Filter the DataFrame for the date range 2015-01-01 to 2023-06-22
df = df['01-01-2015':'12-12-2022']

Testing for stationarity -¶

In [ ]:
test_result = adfuller(df['Close'])
In [ ]:
#H0: It is non stationary
#H1: It is stationary

def adfuller_test(close):
  result = adfuller(close)
  labels = ['ADF Test Statistics','p-value','#Lags Used','Number of Observations Used']
  for value,label in zip(result,labels):
    print(label+' : '+str(value))
  if result[1] <= 0.05:
    print("Strong evidence against the null hypothesis(H0), reject the null hypothesis,i.e. data is stationary")
  else:
    print("Weak evidence against the null hypothesis(H0), means accept the null hypothesis, i.e. data is non stationary")
In [ ]:
adfuller_test(df['Close'])
ADF Test Statistics : -0.3454253363823246
p-value : 0.9188198000791241
#Lags Used : 22
Number of Observations Used : 1978
Weak evidence against the null hypothesis(H0), means accept the null hypothesis, i.e. data is non stationary

The output from the Augmented Dickey-Fuller (ADF) test provides evidence on whether the Apple stock price data is stationary:

  • ADF Test Statistic: The value of -0.345 suggests that the test statistic is greater than the critical values, implying non-stationarity.
  • p-value: A high p-value of 0.919 indicates that the null hypothesis of non-stationarity cannot be rejected with a high degree of confidence.
  • Lags Used: The test has considered 22 lags in assessing the time series data.
  • Data Points: There are 1978 observations used for the test, providing a substantial dataset for the analysis.
  • Conclusion: Given the test statistic and p-value, there is weak evidence against the null hypothesis; thus, the time series is considered non-stationary, meaning it likely contains some form of trend or seasonality.

Differencing -¶

In [ ]:
df['Close_shift'] = df['Close'] - df['Close'].shift(1)
In [ ]:
df['Close_shift'] = df['Close_shift'].fillna(method='bfill')
df['Close_shift']
Out[ ]:
Date
2015-01-02   -0.770000
2015-01-05   -0.770000
2015-01-06    0.002501
2015-01-07    0.372499
2015-01-08    1.035000
                ...   
2022-12-06   -3.720001
2022-12-07   -1.970001
2022-12-08    1.709991
2022-12-09   -0.489990
2022-12-12    0.160004
Name: Close_shift, Length: 2001, dtype: float64
In [ ]:
test_result = adfuller(df['Close_shift'])
In [ ]:
#H0: It is non stationary
#H1: It is stationary

def adfuller_test(close):
  result = adfuller(close)
  labels = ['ADF Test Statistics','p-value','#Lags Used','Number of Observations Used']
  for value,label in zip(result,labels):
    print(label+' : '+str(value))
  if result[1] <= 0.05:
    print("Strong evidence against the null hypothesis(H0), reject the null hypothesis,i.e. data is stationary")
  else:
    print("Weak evidence against the null hypothesis(H0), means accept the null hypothesis, i.e. data is non stationary")
In [ ]:
adfuller_test(df['Close_shift'])
ADF Test Statistics : -9.374881871147709
p-value : 7.21200317852827e-16
#Lags Used : 21
Number of Observations Used : 1979
Strong evidence against the null hypothesis(H0), reject the null hypothesis,i.e. data is stationary

Plot after differencing -¶

In [ ]:
plt.figure(figsize=(16,8))
plt.plot(df['Close_shift'])
plt.title("APPLE First Difference")
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price USD ($)', fontsize=18)
plt.show()
No description has been provided for this image

Auto Regressive Model -¶

Thoughts on Autocorrelation and Partial Autocorrelation -¶

  1. Identification of an AR model is often best done with the PACF -- For an AR model, the theoretical PACF “shuts off” past the order of the model. The phrase “shuts off” means that in theory the partial autocorrelations are equal to 0 beyond that point. Put another way, the number of non-zero partial autocorrelations gives the order of the AR model. By the “order of the model” we mean the most extreme lag of x that is used as a predictor.

  2. Identification of an MA model is often best done with the ACF rather than the PACF. -- For an MA model, the theoretical PACF does not shut off, but instead tapers toward 0 in some manner. A clearer pattern for an MA model is in the ACF. The ACF will have non-zero autocorrelations only at lags involved in the model.

p,d,q. AR model - p, differencing - d, MA lags - q

In [ ]:
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(df['Close_shift'].iloc[1:],lags=40,ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(df['Close_shift'].iloc[1:],lags=40,ax=ax2)
No description has been provided for this image

Inference for the plot above -¶

ACF Plot: The autocorrelation plot shows a significant initial lag, which quickly diminishes, suggesting a short-term relationship in the first lag that does not persist.

PACF Plot: The partial autocorrelation plot also shows a significant spike at the first lag, indicating that there is a strong correlation with the immediate previous value, with no significant correlations in subsequent lags.

Splitting the data -¶

In [ ]:
train_data, test_data = df[0:int(len(df)*0.8)], df[int(len(df)*0.8):]
plt.figure(figsize=(16,8))
plt.title('Apple Prices')
plt.xlabel('Year')
plt.ylabel('Prices')
plt.plot(df['Close_shift'], label='Training Data')
plt.plot(test_data['Close_shift'], 'green', label='Testing Data')
# plt.xticks(np.arange(0,1857, 300), df['Date'][0:2000:300])
plt.legend()
Out[ ]:
<matplotlib.legend.Legend at 0x23dda877be0>
No description has been provided for this image
In [ ]:
#p=1, d=0, q=0 or 1
from statsmodels.tsa.arima.model import ARIMA
In [ ]:
# model=ARIMA(train_data['Close_shift'],order=(1,1,1))
model=ARIMA(df['Close_shift'],order=(1,0,1))
model_fit=model.fit()
In [ ]:
model_fit.summary()
Out[ ]:
SARIMAX Results
Dep. Variable: Close_shift No. Observations: 2001
Model: ARIMA(1, 0, 1) Log Likelihood -3974.312
Date: Fri, 08 Dec 2023 AIC 7956.625
Time: 01:22:06 BIC 7979.030
Sample: 0 HQIC 7964.851
- 2001
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
const 0.0572 0.036 1.592 0.111 -0.013 0.127
ar.L1 0.3952 0.149 2.647 0.008 0.103 0.688
ma.L1 -0.4603 0.146 -3.156 0.002 -0.746 -0.174
sigma2 3.1094 0.046 67.015 0.000 3.018 3.200
Ljung-Box (L1) (Q): 0.01 Jarque-Bera (JB): 4405.24
Prob(Q): 0.92 Prob(JB): 0.00
Heteroskedasticity (H): 41.33 Skew: -0.02
Prob(H) (two-sided): 0.00 Kurtosis: 10.27


Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

Residuals plot -¶

In [ ]:
from pandas import DataFrame
residuals = DataFrame(model_fit.resid)
plt.figure(figsize=(16,8))
#residuals.plot()
plt.plot(residuals)
Out[ ]:
[<matplotlib.lines.Line2D at 0x23de9fc7b50>]
No description has been provided for this image

Inference of the above plot -¶

Residual Behavior: The residuals are centered around zero, which is a good sign, indicating that the model does not systematically over or under predict.

Volatility: There is some apparent volatility in the residuals, with periods of greater fluctuation, particularly towards the end of the time series.

Randomness: The lack of a clear pattern in the residuals suggests that the model has captured most of the underlying structure of the data, leaving behind noise, which ideally should be random.

In [ ]:
residuals.plot(kind='kde')
Out[ ]:
<AxesSubplot:ylabel='Density'>
No description has been provided for this image
In [ ]:
residuals.describe()
Out[ ]:
0
count 2001.000000
mean -0.000065
std 1.763805
min -10.555294
25% -0.460171
50% -0.025454
75% 0.473118
max 11.629638
In [ ]:
train_arima = train_data['Close_shift'].values
test_arima = test_data['Close_shift'].values
In [ ]:
p_values = range(0,4)
d_values = range(0,3)
q_values = range(0,2)
In [ ]:
# Way-1
import itertools
pdq = list(itertools.product(p_values,d_values,q_values))
In [ ]:
for param in pdq:
 for i in range(len(test_arima)):
       try:
         model = ARIMA(train_arima,order=param)
         model_fit = model.fit(disp=0)
         pred_y = model_fit.forecast()[0]
         predictions.append(pred_y)
         error = mean_sqared_error(test_arima, predictions)
         print(model_fit.aic)
         print("ARIMA%s, MSE=0.2, RMSE=0.2"% (order,error,math.sqrt(error)))
       except:
         continue
In [ ]:
for p in p_values:
 for d in d_values:
   for q in q_values:
     order = (p,d,q)
     train, test = train_arima, test_arima
     predictions = list()
     for i in range(len(test_arima)):
       try:
         model = ARIMA(train_arima,order)
         model_fit = model.fit(disp=0)
         pred_y = model_fit.forecast()[0]
         predictions.append(pred_y)
         error = mean_sqared-error(test_arima, predictions)
         print("ARIMA%s, MSE=0.2, RMSE=0.2"% (order,error,math.sqrt(error)))
       except:
         continue
In [ ]:
history = [x for x in train_arima]
print(type(history))
predictions = list()
for t in range(len(test_arima)):
    model = ARIMA(history, order=(1,1,0))
    model_fit = model.fit()
    output = model_fit.predict(start=len(train_arima)+t-1, end = len(train_arima)+t,dynamic=True)
    #output = model_fit.forecast()
    yhat = output[0]
    predictions.append(yhat)
    obs = test_arima[t]
    history.append(obs)
    #print('predicted=%f, expected=%f' % (yhat, obs))
error = mean_squared_error(test_arima, predictions)
print('MSE : %.3f' % error)
error2 = math.sqrt(error)
print('RMSE : %.3f' % error2)
<class 'list'>
MSE : 14.100
RMSE : 3.755
In [ ]:
# Create a figure
fig = go.Figure()

# Add traces for training data, predicted prices, and actual prices
fig.add_trace(go.Scatter(x=df.index, y=df['Close_shift'], mode='lines', name='Training Data', line=dict(color='blue')))
fig.add_trace(go.Scatter(x=test_data.index, y=predictions, mode='markers+lines', name='Predicted Price', line=dict(color='green', dash='dash')))
fig.add_trace(go.Scatter(x=test_data.index, y=test_data['Close_shift'], mode='lines', name='Actual Price', line=dict(color='red')))

# Update layout for title and axes labels
fig.update_layout(title='Apple Prices Prediction',
                  xaxis_title='Dates',
                  yaxis_title='Prices',
                  legend_title="Legend")

# Show the plot
fig.show()

Inference for the above plot -¶

Training Data: The blue line represents the historical data used to train the ARIMA model. It seems stable without any visible trend, which is consistent with differenced or stationary data typically used in ARIMA modeling.

Predictions vs. Actuals: The green dotted line shows the predicted prices, while the red line shows the actual stock prices. The model predictions seem to capture the general volatility of the stock but do not align closely with the actual price movements, particularly in areas where there are sharp peaks or troughs.

Volatility: The model appears to have increasing difficulty predicting the price as time progresses, seen in the widening gap between the predicted and actual values, which could indicate increasing market volatility or model inadequacy for capturing complex patterns.

Predictive Capability: The predictive performance of the model might be adequate for very short-term forecasting but seems to deteriorate for longer-term predictions, which is typical given the complex and often chaotic nature of stock price movements.

Apple-Specific Movements: Any significant deviations between predicted and actual prices might correlate with Apple-specific events such as product launches, earnings reports, or market sentiment changes, which may not be fully captured by the ARIMA model.

In [ ]:
# Create a figure
fig = go.Figure()

# Add traces for predicted prices and actual prices
fig.add_trace(go.Scatter(x=test_data.index, y=predictions, mode='markers+lines', name='Predicted Price', 
                         line=dict(color='green', dash='dash'), marker=dict(color='green')))
fig.add_trace(go.Scatter(x=test_data.index, y=test_data['Close_shift'], mode='lines', name='Actual Price', 
                         line=dict(color='red')))

# Update layout for title and axes labels
fig.update_layout(title='Apple Prices Prediction',
                  xaxis_title='Dates',
                  yaxis_title='Prices',
                  legend_title="Legend")

# Show the plot
fig.show()

Inference for the above plot -¶

Close Tracking: The predicted prices (green) closely track the actual stock prices (red), indicating the model is responsive to changes in the stock's price on a day-to-day basis.

Short-Term Accuracy: The model appears to have short-term predictive accuracy as the predictions closely follow the actual price movements, which might be useful for short-term trading strategies.

Volatility Representation: Both lines exhibit volatility, which the model seems to capture well, reflecting the inherent variability in the stock market.

Predictive Challenges: Despite the close tracking, there are points where the predicted price deviates from the actual price, underscoring the difficulty in accurately forecasting stock prices.

  • This suggests that while the ARIMA model may provide valuable insights into short-term price movements, the unpredictable nature of the stock market still presents challenges for precise long-term forecasting.
In [ ]:
# Data
models = ['LSTM', 'Prophet', 'ARIMA']
mse_values = [4.841393899172917, 5.607989691147735, 14.1]
rmse_values = [2.2003167724609374, 2.338448145623682, 3.755]

# Create subplots for MSE and RMSE
fig, axes = plt.subplots(1, 2, figsize=(12, 6))
fig.suptitle('MSE and RMSE Comparison')

# Plot MSE
sns.barplot(x=models, y=mse_values, ax=axes[0])
axes[0].set_title('MSE Comparison')
axes[0].set_ylabel('MSE')

# Plot RMSE
sns.barplot(x=models, y=rmse_values, ax=axes[1])
axes[1].set_title('RMSE Comparison')
axes[1].set_ylabel('RMSE')

# Show the plots
plt.show()
No description has been provided for this image

Sectoral Analysis -

In [ ]:
# Read the CSV file into a DataFrame
microsoft = pd.read_csv('MSFT.csv')
apple = pd.read_csv('AAPL.csv')
nvidia = pd.read_csv('NVDA.csv')

# Convert the 'Date' column to datetime
apple['Date'] = pd.to_datetime(apple['Date'], format='%d-%m-%Y')
microsoft['Date'] = pd.to_datetime(microsoft['Date'], format='%d-%m-%Y')
nvidia['Date'] = pd.to_datetime(nvidia['Date'], format='%d-%m-%Y')

# Set the 'Date' column as the index
apple.set_index('Date', inplace=True)
microsoft.set_index('Date', inplace=True)
nvidia.set_index('Date', inplace=True)

tech_df1 = apple.copy()
tech_df2 = microsoft.copy()
tech_df3 = nvidia.copy()

# Filter the DataFrame for the date range 2015-01-01 to 2023-06-22
tech_df1 = tech_df1['01-01-2015':'12-12-2022']
tech_df2 = tech_df2['01-01-2015':'12-12-2022']
tech_df3 = tech_df3['01-01-2015':'12-12-2022']
In [ ]:
tech_df1.reset_index(inplace=True)
tech_df2.reset_index(inplace=True)
tech_df3.reset_index(inplace=True)
In [ ]:
# Initialize a Prophet model for each stock
tech_prophets = [Prophet(), Prophet(), Prophet()]
tech_dfs = [tech_df1, tech_df2, tech_df3]  # Replace with actual dataframes
company_names = ['Apple', 'Microsoft', 'Nvidia']  # Names for the legend
predictions = []

# Create subplots: one for line graph, one for bar chart
fig = make_subplots(rows=2, cols=1, subplot_titles=('Stock Predictions', 'Average Predicted Closing Prices'))

# Fit the model and make future predictions for each stock
avg_predicted_prices = []
for model, df, company_name in zip(tech_prophets, tech_dfs, company_names):
    # Prepare the DataFrame for Prophet
    df = df.rename(columns={'Date': 'ds', 'Close': 'y'})
    
    # Fit the model
    model.fit(df)
    
    # Create a future DataFrame for predictions
    future = model.make_future_dataframe(periods=90)
    
    # Predict and store the forecast
    forecast = model.predict(future)
    predictions.append(forecast['yhat'])
    
    # Add traces for actual and predicted values to the line graph
    fig.add_trace(go.Scatter(x=df['ds'], y=df['y'], name=f'{company_name} Actual', mode='lines'), row=1, col=1)
    fig.add_trace(go.Scatter(x=forecast['ds'], y=forecast['yhat'], name=f'{company_name} Predicted', mode='lines+markers'), row=1, col=1)
    
    # Calculate the average predicted price for the recent period and append to list
    avg_predicted_prices.append(forecast['yhat'].iloc[-90:].mean())

# Add bar chart for average predicted closing prices
fig.add_trace(go.Bar(x=company_names, y=avg_predicted_prices, name='Avg Predicted Price'), row=2, col=1)

# Customize the layout of the figure
fig.update_layout(height=800, title_text='Technology Sector Stock Analysis')

# Show the figure
fig.show()
01:24:45 - cmdstanpy - INFO - Chain [1] start processing
01:24:46 - cmdstanpy - INFO - Chain [1] done processing
01:24:47 - cmdstanpy - INFO - Chain [1] start processing
01:24:47 - cmdstanpy - INFO - Chain [1] done processing
01:24:48 - cmdstanpy - INFO - Chain [1] start processing
01:24:48 - cmdstanpy - INFO - Chain [1] done processing

Inference for the above line plot -¶

Sector Growth: The overall uptrend in the actual prices of Apple, Microsoft, and Nvidia reflects the robust growth within the Technology sector, indicative of innovation, market expansion, and increasing consumer and enterprise demand for technology solutions.

Sector Predictions: The predictions across the three companies, while diverging in individual cases, collectively demonstrate the model's attempt to capture sector-wide trends. The average predicted price line suggests an aggregated forecast, smoothing out individual stock volatility and providing a broader perspective on the Technology sector's trajectory.

Impact of Sector-Wide Events:

Product Innovation and Releases: New product launches, such as Apple's iPhone models or Nvidia's graphics cards, typically generate consumer excitement and can significantly affect stock prices within the sector. Technological Advancements: Breakthroughs in cloud computing, AI, and other emerging technologies, often spearheaded by companies like Microsoft and Nvidia, can lead to increased investment and stock valuation. Market Sentiment: The sector is also subject to market sentiment driven by tech-specific and macroeconomic news, regulatory changes, and shifts in consumer behavior, all of which can cause the kind of volatility observed in the plot. Divergence and Corrections: The instances where predictions and actual prices diverge may correspond to unforeseen market shocks or events that disrupt the usual business operations, such as supply chain issues or global economic instability.

Sectoral Analysis Utility: This kind of visualization is particularly valuable for sectoral analysis as it encapsulates not just the performance of individual companies but also the collective health of the Technology sector. It provides a visual benchmark for the sector's performance and helps identify periods of outperformance or underperformance relative to the market.

Inference for the above bar plot -¶

The attached bar chart represents the average predicted closing prices for stocks from three key players in the Technology sector: Apple, Microsoft, and Nvidia. The bar heights indicate that Microsoft's average predicted closing price is the highest among the three, suggesting that, according to the model's predictions, Microsoft may outperform Apple and Nvidia in terms of stock price over the forecasted period. Apple's predicted closing price is the lowest, with Nvidia's falling in between the two. This visualization can serve as a simplified comparative tool for potential future performance within the sector, based on the models' forecasts.

END OF NOTEBOOK | THANK YOU !!!¶